
4,801 research outputs found

Considerations about multistep community detection

Author: A Broder
A Clauset
A Lancichinetti
AL Barabási
BH Good
FD Malliaros
HP Kriegel
J Reichardt
JC Bezdek
L Danon
M Belkin
M Girvan
ME Newman
ME Newman
ME Newman
P Krapivsky
R Kannan
S Fortunato
S Fortunato
TF Chan
VD Blondel
W Zhang
Publication date: 27/02/2014

The problem of community detection in networks, and its implications, have attracted considerable attention because of important applications in both the natural and social sciences. A number of algorithms have been developed to solve this problem, addressing either speed optimization or the quality of the partitions calculated. In this paper we propose a multi-step procedure bridging the fastest but less accurate algorithms (coarse clustering) with the slowest, most effective ones (refinement). By adopting a heuristic ranking of the nodes and classifying a fraction of them as 'critical', a refinement step can be restricted to this subset of the network, thus saving computational time. Preliminary numerical results are discussed, showing improvement of the final partition. Comment: 12 pages

arXiv.org e-Print Archive

Crossref

Archivio Istituzionale della Ricerca- Università del Salento

An automatic method to generate domain-specific investigator networks using PubMed abstracts

Author: Ajay Yesupriya
Anja Wulf
BK Lin
DA Lindberg
FS Collins
JA Kremer
JP Ioannidis
JP Ioannidis
Junfeng Qu
Marta Gwinn
ME Newman
ME Newman
ME Newman
ME Newman
MJ Khoury
Muin J Khoury
O Tutarel
S Teasley
Wei Yu
Publication venue: BioMed Central
Publication date: 01/06/2007

Background: Collaboration among investigators has become critical to scientific research. This includes ad hoc collaboration established through personal contacts as well as formal consortia established by funding agencies. Continued growth in online resources for scientific research and communication has promoted the development of highly networked research communities. Extending these networks globally requires identifying additional investigators in a given domain, profiling their research interests, and collecting current contact information. We present a novel strategy for building investigator networks dynamically and producing detailed investigator profiles using data available in PubMed abstracts. Results: We developed a novel strategy to obtain detailed investigator information by automatically parsing the affiliation string in PubMed records. We illustrated the results by using a published literature database in human genome epidemiology (HuGE Pub Lit) as a test case. Our parsing strategy extracted country information from 92.1% of the affiliation strings in a random sample of PubMed records and from 97.0% of HuGE records, with accuracies of 94.0% and 91.0%, respectively. Institution information was parsed from 91.3% of the general PubMed records (accuracy 86.8%) and from 94.2% of HuGE PubMed records (accuracy 87.0%). We demonstrated the application of our approach to dynamic creation of investigator networks by creating a prototype information system containing a large database of PubMed abstracts relevant to human genome epidemiology (HuGE Pub Lit), indexed using PubMed medical subject headings converted to Unified Medical Language System concepts. Our method was able to identify 70–90% of the investigators/collaborators in three different human genetics fields; it also successfully identified 9 of 10 genetics investigators within the PREBIC network, an existing preterm birth research network. Conclusion: We successfully created a web-based prototype capable of creating domain-specific investigator networks based on an application that accurately generates detailed investigator profiles from PubMed abstracts combined with robust standard vocabularies. This approach could be used for other biomedical fields to efficiently establish domain-specific investigator networks.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Outlier Edge Detection Using Random Graph Generation Models and Applications

Author: A Lancichinetti
AK Jain
DJ Watts
G Karypis
H Zhang
J Leskovec
J Shi
J Yang
L Akoglu
L Danon
L Danon
L Liu
L Lu
L Waltman
LC Freeman
M Choudhury De
M Coscia
M Newman
M Rosvall
ME Newman
ME Newman
MEJ Newman
MR Brito
R Yu
S Fortunato
S Lloyd
S Papadopoulos
SE Schaeffer
VD Blondel
VJ Hodge
X Dong
Publication date: 21/06/2016

Outliers are samples that are generated by different mechanisms from other normal data samples. Graphs, in particular social network graphs, may contain nodes and edges that are made by scammers, malicious programs or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has focused only on detecting outlier nodes. In this article, we study the properties of edges and propose outlier edge detection algorithms using two random graph generation models. We found that the edge-ego-network, which can be defined as the induced graph that contains the two end nodes of an edge, their neighboring nodes and the edges that link these nodes, contains critical information for detecting outlier edges. We evaluated the proposed algorithms by injecting outlier edges into some real-world graph data. Experimental results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Preferential Attachment Random Graph Generation model consistently gives good performance regardless of the test graph data. Furthermore, the proposed algorithms are not limited to the area of outlier edge detection. We demonstrate three different applications that benefit from the proposed algorithms: 1) a preprocessing tool that improves the performance of graph clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel noisy data clustering algorithm. These applications show the great potential of the proposed outlier edge detection techniques. Comment: 14 pages, 5 figures, journal paper
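The edge-ego-network defined in this abstract can be illustrated with a short sketch. It assumes an undirected graph stored as a dictionary of neighbor sets; the function name `edge_ego_network` and the data layout are illustrative choices, not taken from the paper, and the outlier scoring itself is not reproduced here.

```python
def edge_ego_network(adj, u, v):
    """Return (nodes, edges) of the subgraph induced by an edge's neighborhood.

    adj: dict mapping each node to the set of its neighbors (undirected graph).
    u, v: the two end nodes of the edge under inspection.
    """
    # Node set: both end nodes plus all of their neighbors.
    nodes = {u, v} | set(adj[u]) | set(adj[v])
    # Induced edges: every edge of the graph whose two ends are in the node set.
    # frozenset makes (a, b) and (b, a) count as the same undirected edge.
    edges = {frozenset((a, b)) for a in nodes for b in adj[a] if b in nodes}
    return nodes, edges


# Hypothetical toy graph: a 4-cycle 1-2-4-3-1 with a pendant node 5 attached to 4.
toy_graph = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3, 5}, 5: {4}}
ego_nodes, ego_edges = edge_ego_network(toy_graph, 1, 2)
```

In the toy graph, node 5 is two hops away from both end nodes of the edge (1, 2), so it falls outside the edge-ego-network.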

arXiv.org e-Print Archive

Qatar University Institutional Repository

Crossref

Directory of Open Access Journals

Trepo - Institutional Repository of Tampere University

Network 'small-world-ness': a quantitative method for determining canonical network equivalence

Author: A Barrat
A Ozgur
A Roxin
A Wagner
AL Barabasi
B Efron
D Lusseau
DE Knuth
DJ Watts
DJ Watts
DS Bassett
H Ebel
H Jeong
H Jeong
J Saramaki
JG White
K Klemm
Kevin Gurney
L Tian
LA Adamic
LA Amaral
LF Lago-Fernandez
LR Little
M Barahona
M Bollobas
M Faloutsos
M Huxham
M Kaiser
MA Janssen
MA Stephens
Mark D. Humphries
MD Humphries
ME Newman
ME Newman
ME Newman
ME Newman
MEJ Newman
MEJ Newman
MEJ Newman
MEJ Newman
MJ Conyon
MJ Keeling
ND Martinez
O Sporns
Olaf Sporns
P Sen
PS Bearman
R Albert
R Cohen
R de Castro
R Khanin
R Milo
RF Cancho
RJ Prill
S Achard
S Boccaletti
S Delre
S Valverde
T Nishikawa
TI Netoff
V Braitenberg
V Latora
VP Zhigulin
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2008

Background: Many technological, biological, social, and information networks fall into the broad class of 'small-world' networks: they have tightly interconnected clusters of nodes, and a shortest mean path length that is similar to a matched random graph (same number of nodes and edges). This semi-quantitative definition leads to a categorical distinction ('small/not-small') rather than a quantitative, continuous grading of networks, and can lead to uncertainty about a network's small-world status. Moreover, systems described by small-world networks are often studied using an equivalent canonical network model, the Watts-Strogatz (WS) model. However, the process of establishing an equivalent WS model is imprecise and there is a pressing need to discover ways in which this equivalence may be quantified. Methodology/Principal Findings: We defined a precise measure of 'small-world-ness' S based on the trade-off between high local clustering and short path length. A network is now deemed a 'small-world' if S > 1, an assertion which may be tested statistically. We then examined the behavior of S on a large data-set of real-world systems. We found that all these systems were linked by a linear relationship between their S values and the network size n. Moreover, we show a method for assigning a unique Watts-Strogatz (WS) model to any real-world network, and show analytically that the WS models associated with our sample of networks also show linearity between S and n. Linearity between S and n is not, however, inevitable, and neither is S maximal for an arbitrary network of given size. Linearity may, however, be explained by a common limiting growth process. Conclusions/Significance: We have shown how the notion of a small-world network may be quantified. Several key properties of the metric are described and the use of WS canonical models is placed on a more secure footing.
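The abstract does not spell out the formula for S. A minimal sketch, assuming the commonly cited ratio form in which the clustering coefficient and the mean shortest path length are each normalized by their values on a matched random graph, could look as follows. The function and parameter names are illustrative, and the four input values are assumed to have been measured beforehand.

```python
def small_world_ness(clustering, path_length, clustering_rand, path_length_rand):
    """Ratio-form small-world-ness S (assumed form, not quoted from the abstract).

    clustering, path_length: values measured on the network under study.
    clustering_rand, path_length_rand: values for a matched random graph
    with the same number of nodes and edges.
    """
    gamma = clustering / clustering_rand    # how much more clustered than random
    lam = path_length / path_length_rand    # how much longer the paths are than random
    return gamma / lam


# Hypothetical measurements: ten times the random clustering, paths 20% longer.
s_value = small_world_ness(0.5, 3.0, 0.05, 2.5)
is_small_world = s_value > 1
```

Under this form, a network that matches its random counterpart on both quantities gets S = 1, so values above 1 indicate clustering gained at little cost in path length.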

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository

White Rose Research Online

Seeds Buffering for Information Spreading Processes

Author: A Kumar
A Wald
C Wang
D Kempe
D Siegmund
DJ Watts
E Bulut
F Morone
FP Lange de
GJ Jakab
J Jankowski
J Leskovec
M Granovetter
M Kitsak
ME Newman
ME Newman
R Michalski
R Michalski
S Sridhar
T Opsahl
T Opsahl
WH Price
Publication venue: Springer Science and Business Media LLC
Publication date: 14/09/2017

Seeding strategies for influence maximization in social networks have been studied for more than a decade. They have mainly relied on the activation of all resources (seeds) simultaneously in the beginning; yet, it has been shown that sequential seeding strategies are commonly better. This research focuses on studying sequential seeding with buffering, which is an extension of the basic sequential seeding concept. The proposed method avoids choosing nodes that will be activated through the natural diffusion process, which leads to better use of the budget for activating seed nodes in the social influence process. This approach was compared with sequential seeding without buffering and single stage seeding. The results on both real and artificial social networks confirm that the buffer-based consecutive seeding is a good trade-off between the final coverage and the time to reach it. It performs significantly better than its rivals for a fixed budget. The gain is obtained by dynamic rankings and the ability to detect network areas with nodes that are not yet activated and have high potential of activating their neighbours. Comment: Jankowski, J., Bródka, P., Michalski, R., & Kazienko, P. (2017, September). Seeds Buffering for Information Spreading Processes. In International Conference on Social Informatics (pp. 628-641). Springer

arXiv.org e-Print Archive

Crossref

The Routing of Complex Contagion in Kleinberg's Small-World Networks

Author: D Centola
D Easley
D Kempe
DJ Watts
GJ Baxter
J Adler
J Balogh
J Balogh
J Chalupa
M Granovetter
M Granovetter
ME Newman
MEJ Newman
MS Granovetter
R Motwani
S Milgram
Publication date: 11/05/2016

In Kleinberg's small-world network model, strong ties are modeled as deterministic edges in the underlying base grid and weak ties are modeled as random edges connecting remote nodes. The probability of connecting a node $u$ with node $v$ through a weak tie is proportional to $1/|uv|^\alpha$, where $|uv|$ is the grid distance between $u$ and $v$ and $\alpha \ge 0$ is the parameter of the model. Complex contagion refers to the propagation mechanism in a network where each node is activated only after $k \ge 2$ neighbors of the node are activated. In this paper, we propose the concept of routing of complex contagion (or complex routing), where we can activate one node at each time step with the goal of activating the targeted node in the end. We consider a decentralized routing scheme where only the weak ties from the activated nodes are revealed. We study the routing time of complex contagion and compare the result with simple routing and complex diffusion (the diffusion of complex contagion, where all nodes that could be activated are activated immediately in the same step with the goal of activating all nodes in the end). We show that for decentralized complex routing, the routing time is lower bounded by a polynomial in $n$ (the number of nodes in the network) for the whole range of $\alpha$, both in expectation and with high probability (in particular, $\Omega(n^{\frac{1}{\alpha+2}})$ for $\alpha \le 2$ and $\Omega(n^{\frac{\alpha}{2(\alpha+2)}})$ for $\alpha > 2$ in expectation), while the routing time of simple contagion has a polylogarithmic upper bound when $\alpha = 2$. Our results indicate that complex routing is harder than complex diffusion and that the routing time of complex contagion differs exponentially from that of simple contagion at the sweet spot. Comment: Conference version will appear in COCOON 201
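The weak-tie rule described in this abstract (connection probability proportional to the inverse of grid distance raised to the power alpha) can be sketched for a single source node. The sketch assumes a two-dimensional square grid and Manhattan distance as the grid distance; the function name `weak_tie_distribution` and the parameter `side` (grid side length) are illustrative and do not come from the paper.

```python
def weak_tie_distribution(side, u, alpha):
    """Probability that a weak tie from node u ends at each other grid node.

    side: side length of a square grid (so the grid has side * side nodes).
    u: source node as an (x, y) tuple.
    alpha: model parameter, alpha >= 0; weight of node v is 1 / |uv|**alpha.
    """
    ux, uy = u
    weights = {}
    for x in range(side):
        for y in range(side):
            if (x, y) == u:
                continue  # a node has no weak tie to itself
            distance = abs(x - ux) + abs(y - uy)  # Manhattan grid distance (assumed)
            weights[(x, y)] = distance ** (-alpha)
    total = sum(weights.values())
    # Normalize so that the weights form a probability distribution.
    return {v: w / total for v, w in weights.items()}


# Hypothetical example: 3 x 3 grid, source in a corner, alpha = 2.
tie_probs = weak_tie_distribution(3, (0, 0), 2.0)
```

With alpha equal to 0 every other node is equally likely, and larger values of alpha concentrate the weak ties on nearby nodes.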

arXiv.org e-Print Archive

Crossref

Semi-Supervised Overlapping Community Finding based on Label Propagation with Pairwise Constraints

Author: A Amelio
A Clauset
A Lancichinetti
A Lancichinetti
A Lancichinetti
D Liu
M Girvan
ME Newman
S Fortunato
V Blondel
YY Ahn
ZY Zhang
Publication date: 17/10/2018

Algorithms for detecting communities in complex networks are generally unsupervised, relying solely on the structure of the network. However, these methods can often fail to uncover meaningful groupings that reflect the underlying communities in the data, particularly when those structures are highly overlapping. One way to improve the usefulness of these algorithms is by incorporating additional background information, which can be used as a source of constraints to direct the community detection process. In this work, we explore the potential of semi-supervised strategies to improve algorithms for finding overlapping communities in networks. Specifically, we propose a new method, based on label propagation, for finding communities using a limited number of pairwise constraints. Evaluations on synthetic and real-world datasets demonstrate the potential of this approach for uncovering meaningful community structures in cases where each node can potentially belong to more than one community. Comment: Fix table

arXiv.org e-Print Archive

Crossref

Research Repository UCD

Structural Properties of Ego Networks

Author: A Vázquez
AL Barabási
C Amuedo-Dorantes
M Boguñá
MA Serrano
ME Newman
ME Newman
NA Christakis
NO Hodas
R Albert
S Milgram
SL Feld
WW Zachary
Publication date: 18/01/2015

The structure of real-world social networks in large part determines the evolution of social phenomena, including opinion formation, diffusion of information and influence, and the spread of disease. Globally, network structure is characterized by features such as degree distribution, degree assortativity, and clustering coefficient. However, information about global structure is usually not available to each vertex. Instead, each vertex's knowledge is generally limited to the locally observable portion of the network consisting of the subgraph over its immediate neighbors. Such subgraphs, known as ego networks, have properties that can differ substantially from those of the global network. In this paper, we study the structural properties of ego networks and show how they relate to the global properties of networks from which they are derived. Through empirical comparisons and mathematical derivations, we show that structural features, similar to static attributes, suffer from paradoxes. We quantify the differences between global information about network structure and local estimates. This knowledge allows us to better identify and correct the biases arising from incomplete local information. Comment: Accepted by SBP 2015, to appear in the proceedings

arXiv.org e-Print Archive

CiteSeerX

Crossref

The Parameterized Complexity of Centrality Improvement in Networks

Author: A Boral
AM Ambalath
D Eppstein
D Lokshtanov
DR White
F Hüffner
G D’Angelo
K Okamoto
LC Freeman
LC Freeman
M Cygan
M Newman
M Rubinov
ME Newman
P Crescenzi
P Csermely
R Diestel
T Opsahl
U Brandes
U Brandes
Publication date: 04/10/2017

The centrality of a vertex v in a network intuitively captures how important v is for communication in the network. The task of improving the centrality of a vertex has many applications, as a higher centrality often implies a larger impact on the network or lower transportation or administration cost. In this work we study the parameterized complexity of the NP-complete problems Closeness Improvement and Betweenness Improvement, in which we ask to improve a given vertex's closeness or betweenness centrality by a given amount through adding a given number of edges to the network. Herein, the closeness of a vertex v sums the multiplicative inverses of the distances of other vertices to v, and the betweenness sums, for each pair of vertices, the fraction of shortest paths going through v. Unfortunately, for the natural parameter "number of edges to add" we obtain hardness results, even in rather restricted cases. On the positive side, we also give an island of tractability for the parameter measuring the vertex deletion distance to cluster graphs.
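The closeness definition in this abstract (the sum of inverse distances from the other vertices to v) can be computed with a breadth-first search. The sketch below assumes an undirected, unweighted graph stored as a dictionary of neighbor sets, and it treats unreachable vertices as contributing zero; the helper names `bfs_distances`, `closeness` and `with_edge` are illustrative. It only evaluates the effect of one added edge and does not attempt the optimization problem studied in the paper.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, v):
    """Sum of multiplicative inverses of distances between v and all other vertices."""
    dist = bfs_distances(adj, v)
    # Unreachable vertices never enter dist, so they add nothing to the sum.
    return sum(1.0 / d for u, d in dist.items() if u != v)


def with_edge(adj, a, b):
    """Return a copy of the graph with the undirected edge {a, b} added."""
    new_adj = {node: set(neighbors) for node, neighbors in adj.items()}
    new_adj[a].add(b)
    new_adj[b].add(a)
    return new_adj


# Hypothetical example: path a - b - c - d, then add the edge {a, d}.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
before = closeness(path, "a")
after = closeness(with_edge(path, "a", "d"), "a")
```

On the path, adding the edge {a, d} shortens the distances from a to both c and d, so the closeness of a increases.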

arXiv.org e-Print Archive

Crossref